61 research outputs found

Extracting reference voltages from measurement voltages for oil-water two-phase flow measurement of electrical impedance tomography

Author: Jia Jiabin
Wan Xingchen
Yu Hao
Publication venue: 'Elsevier BV'
Publication date: 01/03/2023

Working Memory Capacity of ChatGPT: An Empirical Study

Author: Gong Dongyu
Wan Xingchen
Wang Dingmin
Publication date: 17/06/2023

Working memory is a critical aspect of both human intelligence and artificial intelligence, serving as a workspace for the temporary storage and manipulation of information. In this paper, we systematically assess the working memory capacity of ChatGPT (gpt-3.5-turbo), a large language model developed by OpenAI, by examining its performance in verbal and spatial n-back tasks under various conditions. Our experiments reveal that ChatGPT experiences significant declines in performance as n increases (which necessitates more information to be stored in working memory), suggesting a limit to the working memory capacity strikingly similar to that of humans. Furthermore, we investigate the impact of different instruction strategies on ChatGPT's performance and observe that the fundamental patterns of a capacity limit persist. From our empirical findings, we propose that n-back tasks may serve as tools for benchmarking the working memory capacity of large language models and hold potential for informing future efforts aimed at enhancing AI working memory and deepening our understanding of human working memory through AI models. Comment: 19 pages, 21 figures, 10 tables

arXiv.org e-Print Archive
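
The n-back task named in the abstract above can be sketched in a few lines: at each position the participant (or model) must say whether the current item equals the item n steps earlier. The following is a minimal illustration only; the function names, the "m"/"-" response coding, and the example letter stream are assumptions made here, not the paper's actual protocol or prompts.

```python
def nback_targets(items, n):
    """Return the correct response for each position of an n-back stream.

    'm' marks a match (item equals the one n steps back), '-' a non-match.
    The first n positions can never be matches.
    """
    return [
        "m" if i >= n and items[i] == items[i - n] else "-"
        for i in range(len(items))
    ]


def accuracy(responses, targets):
    """Fraction of positions where the response equals the target."""
    return sum(r == t for r, t in zip(responses, targets)) / len(targets)


# Hypothetical letter stream, scored as a 2-back task.
letters = list("abcbcbdd")
targets = nback_targets(letters, 2)
# A responder that always answers "no match" gets every non-match right.
always_no = ["-"] * len(letters)
score = accuracy(always_no, targets)
```

Raising `n` forces more items to be held at once, which is the load manipulation the abstract refers to.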

Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning

Author: Korhonen Anna
Vulić Ivan
Wan Xingchen
Zhou Han
Publication date: 19/10/2023

Prompt-based learning has been an effective paradigm for large pretrained language models (LLM), enabling few-shot or even zero-shot learning. Black-box prompt search has received growing interest recently for its distinctive properties of gradient-free optimization, proven particularly useful and powerful for model-as-a-service usage. However, the discrete nature and the complexity of combinatorial optimization hinder the efficiency of modern black-box approaches. Despite extensive research on search algorithms, the crucial aspect of search space design and optimization has been largely overlooked. In this paper, we first conduct a sensitivity analysis by prompting LLM, revealing that only a small number of tokens exert a disproportionate amount of influence on LLM predictions. Leveraging this insight, we propose the Clustering and Pruning for Efficient Black-box Prompt Search (ClaPS), a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens. By employing even simple search methods within the pruned search space, ClaPS achieves state-of-the-art performance across various tasks and LLMs, surpassing the performance of complex approaches while significantly reducing search costs. Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning. Comment: Findings of EMNLP 2023. 10 pages, 5 figures, 4 tables (14 pages, 5 figures, 8 tables including references and appendices)

arXiv.org e-Print Archive

Explaining the Adaptive Generalisation Gap

Author: Albanie Samuel
Granziol Diego
Roberts Stephen
Wan Xingchen
Publication date: 26/07/2021

We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods stems from the increased estimation noise in the flattest directions of the true loss surface. We demonstrate that typical schedules used for adaptive methods (with low numerical stability or damping constants) serve to bias relative movement towards flat directions relative to sharp directions, effectively amplifying the noise-to-signal ratio and harming generalisation. We further demonstrate that the numerical stability/damping constant used in these methods can be decomposed into a learning rate reduction and linear shrinkage of the estimated curvature matrix. We then demonstrate significant generalisation improvements by increasing the shrinkage coefficient, closing the generalisation gap entirely in both Logistic Regression and Deep Neural Network experiments. Finally, we show that other popular modifications to adaptive methods, such as decoupled weight decay and partial adaptivity, can be shown to calibrate parameter updates to make better use of sharper, more reliable directions.

arXiv.org e-Print Archive

Explore Bristol Research
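
The decomposition of the damping constant mentioned in the abstract above can be illustrated with a short piece of algebra. This is a sketch under assumptions made here, not the paper's own notation: the update is taken to be $w_{t+1} = w_t - \alpha (B + \delta I)^{-1} g_t$, where $B$ is the estimated curvature matrix, $\delta > 0$ the damping constant, $\alpha$ the learning rate, and $\rho$ a shrinkage coefficient introduced for illustration.

```latex
\begin{align}
B + \delta I
  &= (1+\delta)\left[\frac{1}{1+\delta}\,B + \frac{\delta}{1+\delta}\,I\right]
   = (1+\delta)\bigl[(1-\rho)\,B + \rho\,I\bigr],
  \qquad \rho = \frac{\delta}{1+\delta}, \\
\alpha\,(B + \delta I)^{-1}
  &= \underbrace{\frac{\alpha}{1+\delta}}_{\text{reduced learning rate}}
     \;\underbrace{\bigl[(1-\rho)\,B + \rho\,I\bigr]^{-1}}_{\text{curvature shrunk linearly towards } I}.
\end{align}
```

Under these assumptions, a larger $\delta$ both lowers the effective learning rate and moves the curvature estimate closer to the identity, which is the sense in which increasing the shrinkage coefficient damps movement along flat, noisy directions.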

Estimation of Reference Voltages for Time-difference Electrical Impedance Tomography

Author: Dong Zhongxu
Jia Jiabin
Wan Xingchen
Yu Hao
Zhang Zhixi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/10/2022

Edinburgh Research Explorer

Iterative Averaging in the Quest for Best Test Error

Author: Albanie Samuel
Baskerville Nick P
Granziol Diego
Roberts Stephen
Wan Xingchen
Publication date: 31/10/2021

We analyse and explain the increased generalisation performance of iterate averaging using a Gaussian process perturbation model between the true and batch risk surface on the high dimensional quadratic. We derive three phenomena from our theoretical results: (1) The importance of combining iterate averaging (IA) with large learning rates and regularisation for improved regularisation. (2) Justification for less frequent averaging. (3) That we expect adaptive gradient methods to work equally well, or better, with iterate averaging than their non-adaptive counterparts. Inspired by these results, together with empirical investigations of the importance of appropriate regularisation for the solution diversity of the iterates, we propose two adaptive algorithms with iterate averaging. These give significantly better results compared to stochastic gradient descent (SGD), require less tuning and do not require early stopping or validation set monitoring. We showcase the efficacy of our approach on the CIFAR-10/100, ImageNet and Penn Treebank datasets on a variety of modern and classical network architectures.

arXiv.org e-Print Archive

Explore Bristol Research
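
Iterate averaging, as named in the abstract above, keeps a running mean of the optimiser's weights and evaluates that mean instead of the final iterate. The sketch below shows the idea with plain gradient steps and a tail average; it is not one of the two adaptive algorithms the authors propose, and `grad_fn`, `avg_start`, and the toy quadratic are hypothetical names and choices made here for illustration.

```python
def running_average(avg, new, count):
    """Mean of count + 1 vectors, given the mean `avg` of the first `count`."""
    return [a + (x - a) / (count + 1) for a, x in zip(avg, new)]


def sgd_with_iterate_averaging(grad_fn, w0, lr, steps, avg_start):
    """Run gradient steps and average the iterates from step `avg_start` on.

    grad_fn(w, t) returns the (possibly noisy) gradient at weights w, step t.
    Returns the last iterate and the averaged iterate.
    """
    w = list(w0)
    avg, n_avg = list(w0), 0
    for t in range(steps):
        g = grad_fn(w, t)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        if t >= avg_start:
            avg = running_average(avg, w, n_avg)
            n_avg += 1
    return w, avg


# Toy check on f(w) = w^2 with an exact gradient (no noise).
last, averaged = sgd_with_iterate_averaging(
    lambda w, t: [2 * wi for wi in w], [8.0], lr=0.25, steps=3, avg_start=1
)
```

With a noisy `grad_fn`, the averaged weights fluctuate less than the last iterate, which is why averaging pairs naturally with the large learning rates the abstract mentions.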

BOiLS: Bayesian Optimisation for Logic Synthesis

Author: Ammar Haitham Bou
Grosnit Antoine
Malherbe Cedric
Tutunov Rasul
Wan Xingchen
Wang Jun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/11/2021

Optimising the quality-of-results (QoR) of circuits during logic synthesis is a formidable challenge necessitating the exploration of exponentially sized search spaces. While expert-designed operations aid in uncovering effective sequences, the increase in complexity of logic circuits favours automated procedures. To enable efficient and scalable solvers, we propose BOiLS, the first algorithm adapting Bayesian optimisation to navigate the space of synthesis operations. BOiLS requires no human intervention and trades off exploration versus exploitation through novel Gaussian process kernels and trust-region constrained acquisitions. In a set of experiments on EPFL benchmarks, we demonstrate BOiLS's superior performance compared to state-of-the-art in terms of both sample efficiency and QoR values.

arXiv.org e-Print Archive

UCL Discovery